Abstract

Advice-efficient prediction with expert advice (in analogy to label-efficient prediction) is a variant of the prediction with expert advice game, where on each round of the game we are allowed to ask for the advice of a limited number $M$ out of $N$ experts. This setting is especially interesting when asking every expert for advice on every round is expensive. We present an algorithm for advice-efficient prediction with expert advice that achieves $O\left(\sqrt{\frac{N}{M} T \ln N}\right)$ regret on $T$ rounds of the game.



1 Motivation 



We study the problem of prediction with expert advice in a setting where we have a large set of $N$ experts, but asking all experts for advice on all rounds of the game is overly expensive. For example, the experts may be financial advisers designing investment portfolios for the stock market, and getting the advice of each expert may be very expensive. Therefore, we would like to ask for the advice of a smaller number $M < N$ of experts on each round (generally, $M \ll N$), while still performing nearly as well as we could if we asked all experts for their advice. We call this setting advice-efficient prediction with expert advice, in analogy to label-efficient prediction with expert advice (Cesa-Bianchi and Lugosi, 2006).

2 Setting and Notation



We work in the prediction with expert advice setting (Cesa-Bianchi and Lugosi, 2006). We denote the action space by $\mathcal{X}$, the outcome space by $\mathcal{Y}$, and the loss function by $\ell : \mathcal{X} \times \mathcal{Y} \to [0,1]$ (for our analysis there is no need to assume that the loss is convex in the first argument). The number of experts is denoted by $N$ and the experts are indexed by $h \in \{1, \dots, N\}$. On each round $t$ of the game each expert $h$ produces an advice $\psi_t^h \in \mathcal{X}$. On each round the player is allowed to ask for the advice of a fixed number $M \le N$ of experts. The player asks for advice and plays an action $X_t \in \mathcal{X}$. The environment then reveals an outcome $y_t$, the player suffers the loss $\ell(X_t, y_t)$, and the experts suffer the losses $\ell(\psi_t^h, y_t)$. The goal of the algorithm is to minimize the regret, defined as

$$\sum_{t=1}^T \ell(X_t, y_t) - \min_h \left( \sum_{t=1}^T \ell(\psi_t^h, y_t) \right).$$
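As a small illustration of the definition, the regret can be computed from a full table of losses as below. This is a hypothetical Python helper (the names `regret`, `player_losses`, and `expert_losses` are ours); note that the full table of expert losses is only available in hindsight, since in the advice-efficient setting the player never observes all of it.

```python
from typing import Sequence


def regret(player_losses: Sequence[float],
           expert_losses: Sequence[Sequence[float]]) -> float:
    """Regret after T rounds.

    player_losses[t] holds l(X_t, y_t) and expert_losses[t][h] holds l(psi_t^h, y_t).
    """
    n_experts = len(expert_losses[0])
    # Cumulative loss of each expert over all T rounds.
    expert_totals = [sum(row[h] for row in expert_losses) for h in range(n_experts)]
    return sum(player_losses) - min(expert_totals)
```

For example, a player with losses 0.5 and 1.0 against two experts whose total losses are both 1.0 has regret 0.5. The regret can also be negative.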

3 Main Result 

We prove the following regret bound for the algorithm presented in the Algorithm 1 box.

Theorem 1. The expected regret of Algorithm 1 on $T$ rounds of the game satisfies:

$$\mathbb{E}\left[\sum_{t=1}^T \ell(X_t, y_t)\right] - \min_h \sum_{t=1}^T \ell(\psi_t^h, y_t) \le 2\sqrt{\frac{N}{M} T \ln N}.$$



Algorithm 1: Advice-efficient prediction with expert advice.

∀h: $L_0(h) = 0$.
for $t = 1, 2, \dots$ do
  Let $q_t(h) = \frac{e^{-\eta_t L_{t-1}(h)}}{\sum_{h'} e^{-\eta_t L_{t-1}(h')}}$.
  Sample one expert $H_t$ according to $q_t$. Get advice $\psi_t^{H_t}$.
  Play $X_t = \psi_t^{H_t}$.
  Observe the outcome $y_t$ and suffer the loss $L_t = \ell(X_t, y_t)$.
  Sample $M - 1$ additional experts uniformly without replacement from the remaining $N - 1$. Let $\mathbb{1}_t^h = 1$ if expert $h$ was sampled and $\mathbb{1}_t^h = 0$ otherwise. (For $H_t$, used in the definition of $X_t$, we have $\mathbb{1}_t^{H_t} = 1$.)
  Get advice $\psi_t^h$ for the experts sampled.
  ∀h: $L_t^h = \frac{\ell(\psi_t^h, y_t)\, \mathbb{1}_t^h}{q_t(h) + (1 - q_t(h)) \frac{M-1}{N-1}}$ and $L_t(h) = L_{t-1}(h) + L_t^h$.
end
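A minimal Python sketch of Algorithm 1 follows. It rests on several assumptions of ours: experts are indexed $0, \dots, N-1$; the callbacks `get_advice`, `get_outcome`, and `loss` are hypothetical stand-ins for the experts, the environment, and $\ell$; and the learning rate is the choice $\eta_t = \sqrt{M \ln N / (N t)}$ made in the proof of Theorem 1. The weights are shifted by the minimal cumulative loss before exponentiating, which leaves $q_t$ unchanged but avoids numerical underflow.

```python
import math
import random
from typing import Any, Callable, List


def advice_efficient_forecaster(
    T: int,
    N: int,
    M: int,
    get_advice: Callable[[int, int], Any],  # hypothetical: (t, h) -> psi_t^h
    get_outcome: Callable[[int], Any],      # hypothetical: t -> y_t
    loss: Callable[[Any, Any], float],      # l: X x Y -> [0, 1]
    rng: random.Random,
) -> List[float]:
    """Sketch of Algorithm 1; returns the player's losses l(X_t, y_t) for t = 1..T."""
    assert 1 <= M <= N
    cum = [0.0] * N  # cumulative importance-weighted losses L_{t-1}(h)
    player_losses = []
    for t in range(1, T + 1):
        # Learning rate eta_t = sqrt(M ln N / (N t)), as in the proof of Theorem 1.
        eta = math.sqrt(M * math.log(N) / (N * t))
        # Exponential weights q_t(h); shifting by min(cum) leaves q_t unchanged.
        low = min(cum)
        w = [math.exp(-eta * (c - low)) for c in cum]
        total = sum(w)
        q = [wh / total for wh in w]
        # Sample H_t ~ q_t, get its advice, and play it.
        H = rng.choices(range(N), weights=q)[0]
        advice = {H: get_advice(t, H)}
        y = get_outcome(t)
        player_losses.append(loss(advice[H], y))
        # Sample M - 1 additional experts uniformly without replacement among the rest.
        rest = [h for h in range(N) if h != H]
        for h in rng.sample(rest, M - 1):
            advice[h] = get_advice(t, h)
        # Importance-weighted estimates L_t^h; unobserved experts get L_t^h = 0.
        for h, psi in advice.items():
            p_obs = q[h] + (1 - q[h]) * (M - 1) / (N - 1) if N > 1 else 1.0
            cum[h] += loss(psi, y) / p_obs
    return player_losses
```

In a toy run where expert 0 always predicts the outcome and the other experts never do, the weights concentrate on expert 0 after a few dozen rounds, even though only $M$ of the $N$ experts are consulted per round.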



The "price" that we pay for observing the advice of $M$ instead of all $N$ experts is a multiplicative $\sqrt{N/M}$ term. The constant is identical to the constant in the "simple" analysis of exponentially weighted forecasters in Cesa-Bianchi and Lugosi (2006, Corollary 2.2) and slightly worse than the constant in the tighter analysis in Cesa-Bianchi and Lugosi (2006, Theorem 2.3) (we are losing a $\sqrt{2}$ factor), but the constant can be improved using similar techniques.

4 Analysis 

The analysis is based on the following lemma, which follows from the analysis of EXP3 by Bubeck (2010).



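Lemma 2. For any $N$ sequences of random variables $L_1^h, L_2^h, \dots$ indexed by $h \in \{1, \dots, N\}$, such that $L_t^h \ge 0$, and any non-increasing sequence $\eta_1, \eta_2, \dots$ such that $\eta_t > 0$, for

$$q_t(h) = \frac{e^{-\eta_t \sum_{s=1}^{t-1} L_s^h}}{\sum_{h'} e^{-\eta_t \sum_{s=1}^{t-1} L_s^{h'}}}$$

(assuming for $t = 1$ the sum in the exponent is zero), for all $h^*$ simultaneously we have:

$$\sum_{t=1}^T \sum_h q_t(h) L_t^h \le \sum_{t=1}^T \frac{\eta_t}{2} \sum_h q_t(h) \left(L_t^h\right)^2 + \frac{\ln N}{\eta_T} + \sum_{t=1}^T L_t^{h^*}. \qquad (1)$$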
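Because Lemma 2 is a deterministic statement about exponential weights on nonnegative losses, it can be sanity-checked numerically. The sketch below (the function name `lemma2_gap` and the random test data are ours, not part of the analysis) computes the right-hand side of (1) minus its left-hand side for a given loss table, learning-rate sequence, and comparator $h^*$; the lemma asserts that this gap is nonnegative.

```python
import math
import random


def lemma2_gap(losses, etas, h_star):
    """Right-hand side minus left-hand side of (1).

    losses[t][h] >= 0 plays the role of L_t^h; etas must be non-increasing and positive.
    """
    N = len(losses[0])
    cum = [0.0] * N  # sum_{s < t} L_s^h
    lhs = 0.0
    second_moment_term = 0.0
    for eta, L in zip(etas, losses):
        # q_t(h) from the cumulative losses of rounds 1..t-1 (shifted for stability).
        low = min(cum)
        w = [math.exp(-eta * (c - low)) for c in cum]
        total = sum(w)
        q = [wh / total for wh in w]
        lhs += sum(qh * Lh for qh, Lh in zip(q, L))
        second_moment_term += eta / 2 * sum(qh * Lh ** 2 for qh, Lh in zip(q, L))
        cum = [c + Lh for c, Lh in zip(cum, L)]
    rhs = second_moment_term + math.log(N) / etas[-1] + cum[h_star]
    return rhs - lhs
```

Note that the losses in the check need not lie in $[0,1]$; the lemma only requires them to be nonnegative, which is what allows the importance-weighted estimates of Algorithm 1 to be plugged in.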

Now we are ready to prove Theorem 1.
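Proof of Theorem 1. We study $\sum_h q_t(h) L_t^h$ and $\sum_h q_t(h) (L_t^h)^2$ for the losses $L_t^h$ defined in the algorithm (all expectations below are over the randomness of round $t$, conditioned on the preceding rounds). Since expert $h$ is observed with probability $\mathbb{E}[\mathbb{1}_t^h] = q_t(h) + (1 - q_t(h)) \frac{M-1}{N-1}$, we have:

$$\mathbb{E}[L_t^h] = \ell(\psi_t^h, y_t).$$

And, since $H_t$ is sampled according to $q_t$, we have:

$$\mathbb{E}[L_t] = \sum_h q_t(h) \ell(\psi_t^h, y_t).$$

We also have:

$$\mathbb{E}\left[\sum_h q_t(h) L_t^h\right] = \sum_h q_t(h) \mathbb{E}[L_t^h] = \sum_h q_t(h) \ell(\psi_t^h, y_t) = \mathbb{E}[L_t]. \qquad (2)$$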



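And from here:

$$\mathbb{E}\left[\sum_h q_t(h) \left(L_t^h\right)^2\right] = \sum_h q_t(h) \frac{\ell(\psi_t^h, y_t)^2\, \mathbb{E}[\mathbb{1}_t^h]}{\left(q_t(h) + (1 - q_t(h)) \frac{M-1}{N-1}\right)^2} \le \sum_h \frac{q_t(h)}{q_t(h) + (1 - q_t(h)) \frac{M-1}{N-1}} = \sum_h \frac{q_t(h)(N-1)}{q_t(h)(N-M) + M - 1} \le \frac{N}{M}. \qquad (3)$$

The proof of the last inequality is provided in Lemma 3 in the appendix.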
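Both (2) and (3) rest on the observation probability $\mathbb{E}[\mathbb{1}_t^h] = q_t(h) + (1 - q_t(h)) \frac{M-1}{N-1}$ used in the denominator of $L_t^h$. The hedged Monte Carlo sketch below (hypothetical helper names, an arbitrary example distribution $q_t$) compares the empirical frequency with which each expert is observed under the sampling scheme of Algorithm 1 against this formula. Since exactly $M$ experts are observed per round, the exact probabilities also sum to $M$.

```python
import random


def observation_probabilities(q, M):
    """Exact P(1_t^h = 1) = q(h) + (1 - q(h)) (M - 1) / (N - 1), for N = len(q) > 1."""
    N = len(q)
    return [qh + (1 - qh) * (M - 1) / (N - 1) for qh in q]


def observation_frequencies(q, M, trials, rng):
    """Empirical frequency with which each expert's advice is observed in one round."""
    N = len(q)
    counts = [0] * N
    for _ in range(trials):
        # H_t ~ q, then M - 1 further experts uniformly without replacement.
        H = rng.choices(range(N), weights=q)[0]
        observed = {H, *rng.sample([h for h in range(N) if h != H], M - 1)}
        for h in observed:
            counts[h] += 1
    return [c / trials for c in counts]
```

Dividing $\ell(\psi_t^h, y_t)\,\mathbb{1}_t^h$ by this probability is exactly what makes $\mathbb{E}[L_t^h] = \ell(\psi_t^h, y_t)$.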

By taking expectations of the two sides of (1) and substituting (2) and (3), we obtain for all $h^*$:

$$\mathbb{E}\left[\sum_{t=1}^T \ell(X_t, y_t)\right] \le \frac{N}{2M} \sum_{t=1}^T \eta_t + \frac{\ln N}{\eta_T} + \sum_{t=1}^T \ell(\psi_t^{h^*}, y_t).$$



Finally, taking $\eta_t = \sqrt{\frac{M \ln N}{N t}}$ and using $\sum_{t=1}^T \frac{1}{\sqrt{t}} \le 2\sqrt{T}$ completes the proof.
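The last step can be checked numerically: with $\eta_t = \sqrt{M \ln N / (N t)}$, the first term is at most $\sqrt{\frac{N}{M} T \ln N}$ and the second equals it. The sketch below (function names are ours) evaluates the intermediate bound from the proof and compares it with the bound of Theorem 1; it assumes $N \ge 2$ so that $\ln N > 0$.

```python
import math


def proof_bound(N, M, T):
    """(N / (2M)) * sum_t eta_t + ln N / eta_T with eta_t = sqrt(M ln N / (N t))."""
    etas = [math.sqrt(M * math.log(N) / (N * t)) for t in range(1, T + 1)]
    return N / (2 * M) * sum(etas) + math.log(N) / etas[-1]


def theorem_bound(N, M, T):
    """The bound of Theorem 1: 2 * sqrt((N / M) * T * ln N)."""
    return 2 * math.sqrt(N / M * T * math.log(N))
```

Because $\sum_{t=1}^T t^{-1/2} \le 2\sqrt{T} - 1$, the intermediate bound is in fact strictly smaller than the stated one.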
5 Easy Extensions

The following extensions are easy to show: 

1. Since for $M > 1$ the variance of the $L_t^h$-s is bounded by $(N-1)/(M-1)$ independently of time, it is easy to derive a high-probability result with similar guarantees.

2. It is easy to show that the algorithm and analysis can be extended to adversarial multiarmed bandits where we are allowed to observe the loss of more than one action on each round (reward games can be translated to loss games via the transformation $\ell = 1 - r$, where $r \in [0,1]$ is the reward and $\ell \in [0,1]$ is the loss). Specifically, assume that in the adversarial multiarmed bandit game with $K$ arms the player plays and suffers the loss of one action on each round, but then the player is allowed to observe the losses of $M - 1$ additional arms on the same round. Then, by identifying each arm with an expert that always predicts that arm, we can show that the regret of Algorithm 1 is $O\left(\sqrt{\frac{K}{M} T \ln K}\right)$. Interestingly, for $M > 1$ the variance of importance-weighted sampling is bounded by $(K-1)/(M-1)$ for all game rounds, and it is possible to derive high-probability guarantees without additional smoothing, in contrast to the EXP3.P algorithm.
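The time-independent bound used in both extensions follows from $\mathbb{E}[(L_t^h)^2] = \ell(\psi_t^h, y_t)^2 / \mathbb{E}[\mathbb{1}_t^h] \le 1 / \mathbb{E}[\mathbb{1}_t^h]$ together with $\mathbb{E}[\mathbb{1}_t^h] \ge \frac{M-1}{N-1}$ (with $K$ in place of $N$ in the bandit case). The short sketch below, with a hypothetical helper name, evaluates the resulting second-moment bound, which also bounds the variance, for a given $q_t$ and $M \ge 2$.

```python
def second_moment_bound(q, M):
    """max_h 1 / P(1_t^h = 1), an upper bound on E[(L_t^h)^2] for losses in [0, 1].

    Assumes N = len(q) >= 2 and M >= 2; in the bandit extension read N as K.
    """
    N = len(q)
    return max(1.0 / (qh + (1 - qh) * (M - 1) / (N - 1)) for qh in q)
```

The worst case is an expert with $q_t(h) = 0$, which is observed only through the uniform sampling and attains the bound $(N-1)/(M-1)$ exactly.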
A Lemma 3

Lemma 3. For any probability distribution $q$ on $\{1, \dots, N\}$ and any $1 \le M \le N$:

$$\sum_{h=1}^N \frac{q(h)(N-1)}{q(h)(N-M) + M - 1} \le \frac{N}{M}. \qquad (4)$$



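Proof. First, we show that the maximum of the left-hand side of (4) is attained by the uniform distribution $q(h) = 1/N$. The Lagrangian corresponding to the maximization of the left-hand side of (4) subject to $\sum_h q(h) = 1$ is:

$$\mathcal{L} = \sum_h \frac{q(h)(N-1)}{q(h)(N-M) + M - 1} - \lambda \left( \sum_h q(h) - 1 \right).$$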



The first derivative of the Lagrangian is:

$$\frac{\partial \mathcal{L}}{\partial q(h)} = \frac{(N-1)\left(q(h)(N-M) + M - 1\right) - q(h)(N-M)(N-1)}{\left(q(h)(N-M) + M - 1\right)^2} - \lambda = \frac{(N-1)(M-1)}{\left(q(h)(N-M) + M - 1\right)^2} - \lambda.$$

The important point is that the derivative depends only on a single $q(h)$ and, for $M > 1$ and $N > M$, is strictly decreasing in it. Therefore, when we equate the derivative to zero, the extremum is achieved when all $q(h)$ are equal, and, as a result, they are all equal to $1/N$.
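The closed form of the derivative and its monotonicity in $q(h)$ can be spot-checked with central finite differences. In the sketch below the helper names and the example values $N = 10$, $M = 3$ are arbitrary choices of ours.

```python
def summand(q, N, M):
    """One term of the left-hand side of (4): q (N - 1) / (q (N - M) + M - 1)."""
    return q * (N - 1) / (q * (N - M) + M - 1)


def summand_derivative(q, N, M):
    """Closed form from the text: (N - 1)(M - 1) / (q (N - M) + M - 1)^2."""
    return (N - 1) * (M - 1) / (q * (N - M) + M - 1) ** 2
```

Since the derivative is strictly decreasing in $q(h)$ when $1 < M < N$, the equation $\partial \mathcal{L} / \partial q(h) = 0$ has the same solution for every $h$.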
The second derivative is:

$$\frac{\partial^2 \mathcal{L}}{\partial q(h)^2} = -\frac{2(N-M)(N-1)(M-1)}{\left(q(h)(N-M) + M - 1\right)^3} \le 0$$

(note that for $M > 1$ and $N > M$ the inequality is strict; and for $M = 1$ or $N = M$ it is easy to check that (4) holds), and the mixed partial derivatives are $\frac{\partial^2 \mathcal{L}}{\partial q(h)\, \partial q(h')} = 0$ for $h \ne h'$. Therefore, $q(h) = 1/N$ is the maximum point (for $1 < M < N$). Substituting $q(h) = 1/N$ into the left-hand side of (4) we get:

$$\sum_{h=1}^N \frac{\frac{1}{N}(N-1)}{\frac{1}{N}(N-M) + M - 1} = \frac{N(N-1)}{N - M + N(M-1)} = \frac{N(N-1)}{M(N-1)} = \frac{N}{M}.$$
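As a numerical sanity check of Lemma 3, the sketch below evaluates the left-hand side of (4) for random distributions and for the uniform one. The helper names and the sampling of random distributions (normalized uniform draws) are our assumptions; terms with $q(h) = 0$ are taken to be $0$, which is the value of the summand there for $M > 1$. It assumes $N \ge 2$.

```python
import random


def lemma3_lhs(q, M):
    """Left-hand side of (4) for a probability vector q and 1 <= M <= len(q)."""
    N = len(q)
    return sum(qh * (N - 1) / (qh * (N - M) + M - 1) for qh in q if qh > 0)


def random_distribution(N, rng):
    """A random probability vector on N points, from normalized uniform draws."""
    x = [rng.random() for _ in range(N)]
    s = sum(x)
    return [xi / s for xi in x]
```

The uniform distribution attains $N/M$ exactly for every $1 \le M \le N$, and random distributions stay below it, consistent with the Lagrangian argument.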